🐛 Fix multi-gpu error (AttributeError) for kongnet and nucleus_detector#1074

Open: gozdeg wants to merge 1 commit into develop from dataparallel-err-fix

@gozdeg gozdeg commented Jun 4, 2026

Follow-up to the earlier multi-GPU fix. kongnet and nucleus_detector were missed and still crash on multi-GPU machines with:

AttributeError: 'DataParallel' object has no attribute 'target_channels'

On multi-GPU machines the model is wrapped in nn.DataParallel / DistributedDataParallel, so these attribute reads need to go through the _get_model_attr() helper (or a similar approach), which unwraps the module before reading the attribute.

Changes

  • nucleus_detector.py: read min_distance / tile_shape via the _get_model_attr helper instead of directly off self.model.
  • kongnet.py: unwrap the module before reading target_channels in infer_batch. The helper is not reachable from infer_batch, so the same unwrap is applied inline.
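
To illustrate the failure mode and the unwrap described above, here is a minimal sketch. `TinyModel`, `FakeDataParallel`, and `get_model_attr` are hypothetical names written for this example. They are not the TIAToolbox `_get_model_attr` implementation, and the stand-in wrapper only mimics one property of nn.DataParallel: the wrapped model lives on `.module` and its custom attributes are not forwarded.

```python
class TinyModel:
    """Hypothetical model exposing a custom attribute."""

    def __init__(self) -> None:
        self.target_channels = [0, 2]


class FakeDataParallel:
    """Stand-in for nn.DataParallel: keeps the model on `.module` only."""

    def __init__(self, module: object) -> None:
        self.module = module


def get_model_attr(model: object, name: str) -> object:
    """Read `name` from the underlying model, unwrapping `.module` if present."""
    inner = getattr(model, "module", model)
    return getattr(inner, name)


wrapped = FakeDataParallel(TinyModel())

try:
    _ = wrapped.target_channels  # direct access fails on the wrapper
except AttributeError as err:
    error_message = str(err)

# Going through the helper works for wrapped and unwrapped models alike.
channels = get_model_attr(wrapped, "target_channels")
plain_channels = get_model_attr(TinyModel(), "target_channels")
```

The helper leaves an unwrapped model untouched because `getattr(model, "module", model)` falls back to the model itself when there is no `.module`.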

@gozdeg gozdeg requested review from Jiaqi-Lv and shaneahmed June 4, 2026 15:05
@shaneahmed shaneahmed added this to the Release 2.1.1 milestone Jun 4, 2026
  with torch.inference_mode():
      logits = model(imgs)
-     target_logits = logits[:, model.target_channels, :, :]
+     target_logits = logits[:, target_channels, :, :]
Member

Why not _get_model_attr?

Collaborator Author

@gozdeg gozdeg Jun 4, 2026

@shaneahmed _get_model_attr() is defined in engineABC, so I don't think we can reach it from here.

Member

Can you try this?

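if hasattr(model, "target_channels"):
    target_channels = model.target_channels
elif hasattr(model, "module") and hasattr(model.module, "target_channels"):
    # Model is wrapped (e.g. DataParallel): read from the underlying module.
    target_channels = model.module.target_channels
else:
    msg = "Model has no attribute 'target_channels'."
    raise AttributeError(msg)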

@gozdeg gozdeg changed the title from "🐛 Fix multi-gpu error (DataParallel Err) for kongnet and nucleus_detector" to "🐛 Fix multi-gpu error (AttributeError) for kongnet and nucleus_detector" Jun 4, 2026
@Jiaqi-Lv Jiaqi-Lv requested a review from Copilot June 5, 2026 10:09
@shaneahmed shaneahmed added the bug Something isn't working label Jun 5, 2026
@shaneahmed shaneahmed changed the title 🐛 Fix multi-gpu error (AttributeError) for kongnet and nucleus_detector 🐛 Fix multi-gpu error (AttributeError) for kongnet and nucleus_detector Jun 5, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR addresses a multi-GPU crash in TIAToolbox nucleus detection paths by ensuring model attributes are read safely when the model is wrapped by nn.DataParallel / DistributedDataParallel.

Changes:

  • Update NucleusDetector.post_process_wsi to read min_distance and tile_shape via _get_model_attr() instead of directly from self.model.
  • Update KongNet.infer_batch to resolve target_channels when model is wrapped, avoiding AttributeError under multi-GPU wrappers.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tiatoolbox/models/engine/nucleus_detector.py | Uses the existing unwrapping helper to access model attributes under DP/DDP. |
| tiatoolbox/models/architecture/kongnet.py | Adds wrapper-aware access to target_channels so inference works under DP/DDP. |

Comment on lines 431 to 435
  # min_distance and postproc_tile_shape cannot be None here
  min_distance = kwargs.get("min_distance")
  if min_distance is None:
-     min_distance = self.model.min_distance
+     min_distance = self._get_model_attr("min_distance")
  tile_shape = kwargs.get("tile_shape")
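
The fallback in this hunk (use the caller's keyword argument when one is given, otherwise read the model's default through an unwrapping helper) can be sketched as below. `resolve_option`, `FakeWrapper`, and `TinyDetector` are hypothetical names for illustration and the default values are made up. Only the `kwargs.get(...)` followed by an `is None` check comes from the diff.

```python
from typing import Any


class TinyDetector:
    """Hypothetical model carrying post-processing defaults (values made up)."""

    min_distance = 4
    tile_shape = (2048, 2048)


class FakeWrapper:
    """Stand-in for a DP/DDP wrapper that keeps the model on `.module`."""

    def __init__(self, module: Any) -> None:
        self.module = module


def resolve_option(model: Any, name: str, **kwargs: Any) -> Any:
    """Return kwargs[name] if supplied and not None, else the model default."""
    value = kwargs.get(name)
    if value is None:
        inner = getattr(model, "module", model)  # unwrap if wrapped
        value = getattr(inner, name)
    return value


model = FakeWrapper(TinyDetector())
from_kwargs = resolve_option(model, "min_distance", min_distance=10)
from_model = resolve_option(model, "min_distance")
tile_shape = resolve_option(model, "tile_shape", tile_shape=None)
```

An explicit `None` is treated the same as a missing keyword, which matches the `is None` check in the hunk.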
Comment on lines +865 to +869
+ try:
+     target_channels = model.target_channels
+ except AttributeError:
+     target_channels = model.module.target_channels

Comment on lines +865 to +872
+ try:
+     target_channels = model.target_channels
+ except AttributeError:
+     target_channels = model.module.target_channels

  with torch.inference_mode():
      logits = model(imgs)
-     target_logits = logits[:, model.target_channels, :, :]
+     target_logits = logits[:, target_channels, :, :]
codecov Bot commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 71.42857% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 99.87%. Comparing base (c9c72c9) to head (186b27c).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| tiatoolbox/models/architecture/kongnet.py | 60.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1074      +/-   ##
===========================================
- Coverage    99.88%   99.87%   -0.02%     
===========================================
  Files           85       85              
  Lines        11626    11630       +4     
  Branches      1524     1524              
===========================================
+ Hits         11613    11615       +2     
- Misses           7        9       +2     
  Partials         6        6              
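
The report does not say which two lines in kongnet.py are uncovered. If they are the `except AttributeError` fallback (an assumption, not something the report states), a test that passes a wrapped model would exercise them. A minimal sketch: `resolve_target_channels` copies the try/except from the diff, while `TinyModel` and `FakeWrapper` are hypothetical stand-ins rather than TIAToolbox or PyTorch classes.

```python
class TinyModel:
    """Hypothetical model with the attribute read in infer_batch."""

    target_channels = [1, 3]


class FakeWrapper:
    """Stand-in for a DP/DDP wrapper: model is reachable only via `.module`."""

    def __init__(self, module: object) -> None:
        self.module = module


def resolve_target_channels(model: object) -> list:
    """Same try/except as in the diff, extracted so it can be tested alone."""
    try:
        return model.target_channels
    except AttributeError:
        return model.module.target_channels


def test_plain_model() -> None:
    # Covers the `try` branch.
    assert resolve_target_channels(TinyModel()) == [1, 3]


def test_wrapped_model() -> None:
    # Covers the `except AttributeError` branch.
    assert resolve_target_channels(FakeWrapper(TinyModel())) == [1, 3]


test_plain_model()
test_wrapped_model()
```

In the real test suite the wrapped case would more likely build the model with nn.DataParallel, which can be constructed without a GPU; the stand-in is used here only to keep the sketch dependency-free.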
